Conversation

@wForget
Member

@wForget wForget commented Apr 29, 2021

What changes were proposed in this pull request?

Remove the use of Guava in order to upgrade the Guava version to 27.

Why are the changes needed?

Hadoop 3.2.2 uses Guava 27; this change prepares for the Guava version upgrade.

Does this PR introduce any user-facing change?

no

How was this patch tested?

Modified the Guava version to 27.0-jre, and then compiled.

@AmplabJenkins

Can one of the admins verify this patch?

Member

@srowen srowen left a comment


Looks good, avoiding Guava in favor of the JDK classes.
Is that all the usage of com.google.common.base.Objects?
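For context, the kind of Guava-to-JDK replacement under discussion can be sketched as follows (a minimal illustration borrowing the AppShuffleId names mentioned in this thread, not the actual Spark code):

```java
import java.util.Objects;

// Minimal sketch (not the actual Spark class): java.util.Objects standing in
// for com.google.common.base.Objects in equals/hashCode implementations.
final class AppShuffleId {
    final String appId;
    final int shuffleId;

    AppShuffleId(String appId, int shuffleId) {
        this.appId = appId;
        this.shuffleId = shuffleId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AppShuffleId)) return false;
        AppShuffleId that = (AppShuffleId) o;
        // java.util.Objects.equals replaces Guava's Objects.equal
        return shuffleId == that.shuffleId && Objects.equals(appId, that.appId);
    }

    @Override
    public int hashCode() {
        // java.util.Objects.hash replaces Guava's Objects.hashCode
        return Objects.hash(appId, shuffleId);
    }
}
```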

Member


It isn't super important here I think, but does this result in the same string?

Member Author


The results are different. Guava's Objects produces something like "AppShuffleId{appId=appId, shuffleId=100}", while ToStringBuilder produces something like "RemoteBlockPushResolver.AppShuffleId[appId=appId,shuffleId=100]". Will that cause any problems?
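If the exact string matters, a hand-rolled toString on the plain JDK can reproduce Guava's toStringHelper format and sidestep the difference entirely (a sketch using the field names from this discussion, not the actual Spark code):

```java
// Minimal sketch (not the actual Spark class): a plain-JDK toString keeping
// Guava's "AppShuffleId{appId=..., shuffleId=...}" shape, rather than
// Commons Lang's "RemoteBlockPushResolver.AppShuffleId[appId=...,shuffleId=...]".
final class AppShuffleId {
    final String appId;
    final int shuffleId;

    AppShuffleId(String appId, int shuffleId) {
        this.appId = appId;
        this.shuffleId = shuffleId;
    }

    @Override
    public String toString() {
        return "AppShuffleId{appId=" + appId + ", shuffleId=" + shuffleId + "}";
    }
}
```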

Member


As with hash function changes, it shouldn't matter to programs. But if some program did rely on it, directly or accidentally, this might break. It's a tough call: how much is the change worth? Overall it's an OK improvement, but yeah, I'm hesitant for just this reason. It's more the hash change than this one.
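For the hash specifically, java.util.Objects.hash is specified to return Arrays.hashCode(values), and Guava's varargs Objects.hashCode is documented the same way, so this particular migration should produce identical values. A quick sanity check on the JDK side only (the Guava call itself is not exercised here, and the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.Objects;

public class HashCheck {
    // Returns true if Objects.hash matches the Arrays.hashCode contract that
    // both the JDK and Guava document for their varargs hash helpers.
    static boolean hashesAgree(String appId, int shuffleId) {
        int jdkHash = Objects.hash(appId, shuffleId);
        int arraysHash = Arrays.hashCode(new Object[] {appId, shuffleId});
        return jdkHash == arraysHash;
    }

    public static void main(String[] args) {
        System.out.println(hashesAgree("app-1", 100));  // prints "true"
    }
}
```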

@sunchao
Member

sunchao commented Apr 29, 2021

@wForget Spark master branch has already moved to the shaded Hadoop client by default (see SPARK-33212), which effectively isolates it from the Guava version on the Hadoop side. Did you actually see a Guava conflict issue?

@dongjoon-hyun
Member

+1 for @sunchao's comment. Also, Apache Spark 3.2 is moving toward Hadoop 3.3.x.

@srowen
Member

srowen commented Apr 29, 2021

It may be unnecessary for the reason above; it still probably wouldn't hurt to just move these to standard JDK classes. I do have a little bit of worry about changing behavior, though, with a possibly different hash or toString.

@sunchao
Member

sunchao commented Apr 29, 2021

@srowen yes agreed - it's better to avoid Guava usage in general if it's not necessary.

@dongjoon-hyun
Member

Ya, a different hash always bites us, both at Scala version changes and at Spark version changes.
For this case I'm not sure, but I'll leave this up to your decisions, @srowen and @sunchao.

BTW, it seems we need to revise the incorrect title and PR description about Hadoop 3.2.2. Could you make this PR neutral with respect to Hadoop, @wForget?

@wForget
Member Author

wForget commented Apr 30, 2021

@sunchao @dongjoon-hyun @srowen
Sorry, my description here was not accurate. The conflict was caused by our application introducing multiple versions of Guava; when I tried changing the Guava version to 27, I found a compilation problem.

@wForget wForget changed the title [SPARK-35270][SQL][CORE] Remove the use of guava to fix Hadoop 3.2.2 guava conflict. [SPARK-35270][SQL][CORE] Remove the use of guava in order to upgrade guava version to 27 Apr 30, 2021
@HyukjinKwon
Member

@wForget can you enable GitHub Actions in your forked repository? https://github.com/apache/spark/pull/32395/checks?check_run_id=2465058510

@wForget wForget force-pushed the master-gauva-compatible branch from 28df0ec to 46e1eee Compare May 6, 2021 02:05
@wForget
Member Author

wForget commented May 6, 2021

@HyukjinKwon I have enabled it. How do I rerun these checks?

@HyukjinKwon
Member

Did you do something like #32400 (comment) too? If that's done, feel free to rebase, which should retrigger the tests.

@wForget wForget force-pushed the master-gauva-compatible branch from 46e1eee to d37c843 Compare May 6, 2021 03:47
@pan3793
Member

pan3793 commented May 31, 2021

It seems spark-core already shades Guava. As for Hadoop 3.2, since Spark has already moved to the Hadoop shaded client, I only see Curator depending on Guava. Based on https://cwiki.apache.org/confluence/display/CURATOR/TN13, I think it's OK to bundle a higher version of Guava in the Spark hadoop-3.2 binary dist?

@srowen
Member

srowen commented Jun 1, 2021

I think the concern about changing behavior still stands?

@pan3793
Member

pan3793 commented Jul 26, 2021

Any update?

@github-actions

github-actions bot commented Nov 4, 2021

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Nov 4, 2021
@github-actions github-actions bot closed this Nov 5, 2021